Skip to content

Validate config_db.json in reload path.#4521

Open
dypet wants to merge 5 commits into
sonic-net:masterfrom
dypet:config_reload_validate
Open

Validate config_db.json in reload path.#4521
dypet wants to merge 5 commits into
sonic-net:masterfrom
dypet:config_reload_validate

Conversation

@dypet

@dypet dypet commented May 6, 2026

Copy link
Copy Markdown

What I did

Fix for sonic-net/sonic-buildimage#27178 . /etc/sonic/config_db.json was not validated in the config reload path before loading. If a config that did not conform to schema was partially loaded, the system could end up in a bad unrecoverable state.

How I did it

added a config_file_yang_validation check to config_db.json in config reload path.

How to verify it

'config reload -y' after copying an invalid config to /etc/sonic/config_db.json.

Previous command output

:~$ sudo config reload -y
Acquired lock on /etc/sonic/reload.lock
Disabling container and routeCheck monitoring ...
Stopping SONiC target ...
Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db.json --write-to-db
Traceback (most recent call last):
  File "/usr/local/bin/sonic-cfggen", line 542, in <module>
    main()
    ~~~~^^
  File "/usr/local/bin/sonic-cfggen", line 531, in main
    configdb.mod_config(FormatConverter.output_to_db(data))
    ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/swsscommon/swsscommon.py", line 2862, in mod_config
    raw_data = self.typed_to_raw(data)
  File "/usr/lib/python3/dist-packages/swsscommon/swsscommon.py", line 2797, in typed_to_raw
    elif len(typed_data) == 0:
         ~~~^^^^^^^^^^^^
TypeError: object of type 'int' has no len()
Released lock on /etc/sonic/reload.lock

Config is partially loaded in this case, leaving system in state that needs to be recovered with

sudo cp config_db.json_working /etc/sonic/config_db.json
sudo sonic-db-cli CONFIG_DB SET CONFIG_DB_INITIALIZED 1
sudo config reload -y -f

New command output

:~$ sudo config reload -y
Acquired lock on /etc/sonic/reload.lock
sonic_yang(3):All Keys are not parsed in DASH_HA_GLOBAL_CONFIG
dict_keys(['cp_data_channel_port', 'dp_channel_dst_port', 'dp_channel_src_port_min', 'dp_channel_src_port_max', 'dp_channel_probe_interval_ms', 'dp_channel_probe_fail_threshold', 'dpu_bfd_probe_interval_in_ms', 'dpu_bfd_probe_multiplier', 'dpu_vnet', 'dpu_vlan'])
sonic_yang(3):exceptionList:[]
sonic_yang(3):Data Loading Failed:All Keys are not parsed in DASH_HA_GLOBAL_CONFIG
dict_keys(['cp_data_channel_port', 'dp_channel_dst_port', 'dp_channel_src_port_min', 'dp_channel_src_port_max', 'dp_channel_probe_interval_ms', 'dp_channel_probe_fail_threshold', 'dpu_bfd_probe_interval_in_ms', 'dpu_bfd_probe_multiplier', 'dpu_vnet', 'dpu_vlan'])
exceptionList:[]
/etc/sonic/config_db.json fails YANG validation! Error: Data Loading Failed
All Keys are not parsed in DASH_HA_GLOBAL_CONFIG
dict_keys(['cp_data_channel_port', 'dp_channel_dst_port', 'dp_channel_src_port_min', 'dp_channel_src_port_max', 'dp_channel_probe_interval_ms', 'dp_channel_probe_fail_threshold', 'dpu_bfd_probe_interval_in_ms', 'dpu_bfd_probe_multiplier', 'dpu_vnet', 'dpu_vlan'])
exceptionList:[]
Released lock on /etc/sonic/reload.lock
Aborted!

The incorrect config is not loaded in this case and system stays up.

Signed-off-by: dypet <dypeters@cisco.com>
@dypet dypet requested a review from zjswhhh May 6, 2026 15:14
@mssonicbld

Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: dypet <dypeters@cisco.com>
@mssonicbld

Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

zjswhhh
zjswhhh previously approved these changes May 7, 2026
@zjswhhh

zjswhhh commented May 8, 2026

Copy link
Copy Markdown
Contributor

Hi @qiluo-msft - please help review and merge.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds YANG validation for the default /etc/sonic/config_db.json during the config reload path to prevent invalid config loads from partially applying and leaving the system in an unrecoverable state (per sonic-buildimage#27178).

Changes:

  • Validate the default CONFIG_DB JSON (DEFAULT_CONFIG_DB_FILE) during config reload when no filename is provided.
  • Extend unit tests to assert that validation is invoked and that reload aborts early when validation fails.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
config/main.py Adds a default-config validation step in the reload path before stopping services / flushing CONFIG_DB.
tests/config_test.py Updates reload tests to assert validation is called; adds a failure-path test to ensure reload aborts before executing commands.

Comment thread config/main.py
Comment on lines +2164 to +2167
if filename is None and file_format == 'config_db':
# Validate the default config before stopping services and flushing CONFIG_DB.
config_file_yang_validation(DEFAULT_CONFIG_DB_FILE)
elif filename is not None and filename != "/dev/stdin":

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread config/main.py Outdated
Comment on lines +2165 to +2166
# Validate the default config before stopping services and flushing CONFIG_DB.
config_file_yang_validation(DEFAULT_CONFIG_DB_FILE)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree that multi-ASIC mode fix is missing. @dypet could you check?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Comment thread config/main.py Outdated
Comment on lines +2164 to +2166
if filename is None and file_format == 'config_db':
# Validate the default config before stopping services and flushing CONFIG_DB.
config_file_yang_validation(DEFAULT_CONFIG_DB_FILE)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree with this finding.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added check for file existence

Comment thread tests/config_test.py
@qiluo-msft

Copy link
Copy Markdown
Contributor

A design question: should we skip yang validation if config reload -f? If today it is not validating yang, we may still keep it for one release branch, and enforce it later.

Signed-off-by: dypet <dypeters@cisco.com>
@mssonicbld

Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: dypet <dypeters@cisco.com>
@mssonicbld

Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: dypet <dypeters@cisco.com>
@mssonicbld

Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

@dypet dypet requested review from qiluo-msft and zjswhhh May 22, 2026 14:01

@zjswhhh zjswhhh left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@dypet

dypet commented Jun 2, 2026

Copy link
Copy Markdown
Author

Hi @qiluo-msft , please help merge

@zjswhhh

zjswhhh commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Hi @qiluo-msft - please help merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants